<?php include('common-header.inc'); ?>
	<title>ParseKit - Cocoa Objective-C Framework for parsing, tokenizing and language processing</title>
<?php include('common-nav.inc'); ?>
    	
<h1>ParseKit Documentation</h1>
<div id="content">
<h2 id="home-title">ParseKit</h2>
<p>ParseKit is a Mac OS X Framework written by Todd Ditchendorf in Objective-C and released under the Apache 2 Open Source License. ParseKit is suitable for use on Mac OS X Leopard and later or <a href="iphone.html">iOS</a>. ParseKit is heavily influenced by <a href="http://www.antlr.org/" title="ANTLR">ANTLR</a> by Terence Parr and <a href="http://www.amazon.com/Building-Parsers-Java-Steven-Metsker/dp/0201719622" title="Amazon.com: Building Parsers With Java(TM): Steven John Metsker: Books">"Building Parsers with Java"</a> by Steven John Metsker. Also, ParseKit depends on <a href="http://mattgemmell.com/2008/05/20/mgtemplateengine-templates-with-cocoa" title="MGTemplateEngine - Templates with Cocoa - Matt Gemmell">MGTemplateEngine</a> by Matt Gemmell for its templating features.</p>
<p>The ParseKit Framework offers 3 basic services of general interest to Cocoa developers:</p>

<ol>
<li><b><a href="/tokenization.html">String Tokenization</a></b> via the Objective-C <tt>PKTokenizer</tt> and <tt>PKToken</tt> classes.</li>
<li><b>High-Level Language Parsing via Objective-C</b> - An Objective-C parser-building API (the <tt>PKParser</tt> class and subclasses).</li>
<li><b><a href="http://itod.github.io/ParseKitMiniMathExample/" title="ParseKit MiniMath Example by itod">Objective-C Parser Generation via Grammars</a></b> - Generate Objective-C source code for a parser for your custom language using a <a href="http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form">BNF</a>-style grammar syntax (similar to yacc or ANTLR). While parsing, the parser will provide callbacks to your Objective-C code.</li>
</ol>
<!-- <p>The ParseKit source code is available <a href="http://parsekit.googlecode.com/">on Google Code</a>. -->
<p>The ParseKit source code is available <a href="http://github.com/itod/parsekit/">on GitHub</a>.</p>

<p>More documentation:</p>
<ul>
    <li><a href="iphone.html">Instructions for including ParseKit in your iOS app</a></li>
    <li><a href="http://stackoverflow.com/questions/9649537/how-to-embed-parsekit-as-a-private-framework-in-a-mac-app-bundle/9658158#9658158">Instructions for including ParseKit in your OS X app</a></li>
	<li><a href="doxygen/">Doxygen-generated Header Docs</a></li>
</ul>

<p>Projects using ParseKit:</p>
<ul>
	<li><a href="http://menial.co.uk/software/base/">Base</a>: Mac SQLite tool by Ben Barnett</li>
	<li><a href="http://www.hogbaysoftware.com/products/taskpaper_iphone">TaskPaper for iPhone</a>: Simple to-do lists app by Jesse Grosjean</li>
	<li><a href="http://worqshop.com/">Worqshop</a>: Development environment for iOS with GitHub support by Donny Kurniawan</li>
	<li><a href="http://github.com/ccgus/jstalk/tree/master">JSTalk</a>: Interprocess Cocoa scripting with JavaScript by Gus Mueller</li>
	<li><a href="http://lucidmac.com/products/spike">Spike</a>: A Rails log file viewer/analyzer by Matt Mower</li>
	<li><a href="https://github.com/lok/BayesianKit">BayesianKit</a>: A Cocoa framework implementing a Bayesian classifier by Samuel Mendes</li>
	<li><a href="http://github.com/boucher/tdparsekit/tree/master">Objective-J Port</a> of ParseKit by Ross Boucher</li>
	<li><a href="http://tr.im/http">HTTP Client</a>: HTTP debugging/testing tool</li>
	<li><a href="http://fluidapp.com">Fluid</a>: Site-Specific Browser for Mac OS X</li>
	<li><a href="http://cruzapp.com">Cruz</a>: Social Browser for Mac OS X</li>
	<li><a href="http://fakeapp.com">Fake</a>: A Recordable/Automated Browser for Mac OS X</li>
	<li><a href="http://shapesapp.com">Shapes</a>: Simple, Elegant Diagramming tool for Mac OS X</li>
	<li><a href="http://parsekit.com/okudakit">OkudaKit</a>: Syntax Highlighting Framework for Mac OS X</li>
	<li><a href="http://tr.im/exedore">Exedore</a>: XPath 1.0 implemented in Cocoa (ported from <a href="http://saxonica.com/">Saxon</a>)</li>
</ul>

<h2>Xcode Project</h2>
<p>The ParseKit Xcode project consists of 7 targets:</p>

<ol>
<li><b>ParseKit</b> : the ParseKit Objective-C framework. The central feature/codebase of this project.</li>
<li><b>libParseKit</b> : the ParseKit Framework as a static library for Mac OS X applications.</li>
<li><b>libParseKitMobile</b> : the ParseKit Framework as a static library for iOS applications.</li>
<li><b>ParserGenApp</b> : a simple Mac app that can convert your ParseKit grammars into Objective-C parser source code.</li>
<li><b>Tests</b> : a UnitTest Bundle containing hundreds of unit tests (or more correctly, <i>interaction tests</i>) for the framework as well as some example classes that serve as real-world uses of the framework.</li>
<li><b>DemoApp</b> : a simple Cocoa demo app that gives a visual presentation of the results of tokenizing text using the PKTokenizer class.</li>
<li><b>DebugApp</b> : a simple Cocoa app that exists only to run arbitrary test code through GDB with breakpoints for debugging (I was not able to do that with the UnitTest bundle).</li>
</ol>

<h2>ParseKit Framework</h2>

<hr/>
<h3>Tokenization</h3>

<p>The API for tokenization is provided by the <tt>PKTokenizer</tt> class. Cocoa developers will be familiar with the <tt>NSScanner</tt> class provided by the Foundation Framework, which offers a similar service. However, <tt>PKTokenizer</tt> is simpler and more powerful for many use cases.</p>

<p>Example usage:</p>

<div class="code">
<pre>
NSString *s = @"\"It's 123 blast-off!\", she said, // watch out!\n"
              @"and &lt;= 3.5 'ticks' later /* wince */, it's blast-off!";
PKTokenizer *t = [PKTokenizer tokenizerWithString:s];

PKToken *eof = [PKToken EOFToken];
PKToken *tok = nil;

while ((tok = [t nextToken]) != eof) {
    NSLog(@" (%@)", tok);
}
</pre>
</div>

<p>outputs:</p>
<div class="code">
<pre> ("It's 123 blast-off!")
 (,)
 (she)
 (said)
 (,)
 (and)
 (&lt;=)
 (3.5)
 ('ticks')
 (later)
 (,)
 (it's)
 (blast-off)
 (!)
</pre>
</div>

<p>Each token produced is an object of class <tt>PKToken</tt>. <tt>PKToken</tt>s have a <tt>tokenType</tt> (<tt>Word</tt>, <tt>Symbol</tt>, <tt>Number</tt>, <tt>QuotedString</tt>, etc.) and both a <tt>stringValue</tt> and a <tt>floatValue</tt>.</p>
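
<p>As a minimal sketch of reading these properties, the function below sums the <tt>Number</tt> tokens in a string and collects the <tt>stringValue</tt> of each <tt>Word</tt> token. It assumes the <tt>isNumber</tt> and <tt>isWord</tt> accessors on <tt>PKToken</tt> and the <tt>ParseKit/ParseKit.h</tt> umbrella header; verify both against the headers of the framework version you build with. The function name <tt>SumNumbersAndCollectWords</tt> is hypothetical.</p>

<div class="code">
<pre>
#import &lt;Foundation/Foundation.h&gt;
#import &lt;ParseKit/ParseKit.h&gt; // assumed umbrella header

// Hypothetical helper: returns the sum of all Number tokens in s
// and appends the text of every Word token to words.
static double SumNumbersAndCollectWords(NSString *s, NSMutableArray *words) {
    PKTokenizer *t = [PKTokenizer tokenizerWithString:s];
    PKToken *eof = [PKToken EOFToken];
    PKToken *tok = nil;
    double total = 0.0;

    while ((tok = [t nextToken]) != eof) {
        if (tok.isNumber) {            // assumed accessor for the Number token type
            total += tok.floatValue;
        } else if (tok.isWord) {       // assumed accessor for the Word token type
            [words addObject:tok.stringValue];
        }
        // Symbol and Quoted String tokens are skipped here
    }
    return total;
}
</pre>
</div>

<p>Given the token listing above, only <tt>3.5</tt> would be expected to contribute to the sum for the example string, since <tt>123</tt> sits inside a <tt>Quoted String</tt> token.</p>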

<p>More information about a token can be easily discovered using the <tt>-debugDescription</tt> method instead of the default <tt>-description</tt>. Replace the line containing <tt>NSLog</tt> above with this line:</p>

<div class="code">
<pre>
NSLog(@"%@", [tok debugDescription]);
</pre>
</div>

<p>and each token's type will be printed as well:</p>

<div class="code">
<pre> &lt;Quoted String &laquo;"It's 123 blast-off!"&raquo;>
 &lt;Symbol &laquo;,&raquo;>
 &lt;Word &laquo;she&raquo;>
 &lt;Word &laquo;said&raquo;>
 &lt;Symbol &laquo;,&raquo;>
 &lt;Word &laquo;and&raquo;>
 &lt;Symbol &laquo;&lt;=&raquo;>
 &lt;Number &laquo;3.5&raquo;>
 &lt;Quoted String &laquo;'ticks'&raquo;>
 &lt;Word &laquo;later&raquo;>
 &lt;Symbol &laquo;,&raquo;>
 &lt;Word &laquo;it's&raquo;>
 &lt;Word &laquo;blast-off&raquo;>
 &lt;Symbol &laquo;!&raquo;>
</pre>
</div>


<p>As you can see from the output, <tt>PKTokenizer</tt> is configured by default to properly group characters into tokens including:</p>

<ul>
<li>single- and double-quoted string tokens</li>
<li>common multiple character symbols (<tt>&lt;=</tt>)</li>
<li>apostrophes, dashes and other symbol chars that should not signal the start of a new Symbol token, but rather be included in the current Word or Number token (<tt>it's</tt>, <tt>blast-off</tt>, <tt>3.5</tt>)</li>	
<li>silently ignoring C- and C++-style comments</li>
<li>silently ignoring whitespace</li>
</ul>


<p>The PKTokenizer class is very flexible, and <b>all</b> of those features are configurable. PKTokenizer may be configured to:</p>

<ul>
<li>recognize more (or fewer) multi-char symbols. ex: <div class="code"><pre>[t.symbolState add:@"!="];</pre></div>
<p><small>allows <tt>!=</tt> to be recognized as a single <tt>Symbol</tt> token rather than two adjacent <tt>Symbol</tt> tokens</small></p>
</li>

<li>add new internal symbol chars to be included in the current <tt>Word</tt> token, OR make internal symbols such as apostrophe and dash signal the start of a new <tt>Symbol</tt> token rather than being part of the current <tt>Word</tt> token. ex: 
<div class="code"><pre>[t.wordState setWordChars:YES from:'_' to:'_'];</pre></div>
<p><small>allows Word tokens to contain internal underscores</small></p>	
<div class="code"><pre>[t.wordState setWordChars:NO from:'-' to:'-'];</pre></div>
<p><small>disallows Word tokens from containing internal dashes.</small></p>	
</li>
	
	
<li>change which chars signal the start of a token of any given type. e.g.:
<div class="code"><pre>[t setTokenizerState:t.wordState from:'_' to:'_'];</pre></div>
<p><small>allows Word tokens to start with underscore</small></p>
<div class="code"><pre>[t setTokenizerState:t.quoteState from:'*' to:'*'];</pre></div>
<p><small>allows Quoted String tokens to start with an asterisk, effectively making <tt>*</tt> a new quote symbol (like <tt>"</tt> or <tt>'</tt>)</small></p>
</li>

<li>turn off recognition of single-line "slash-slash" (<tt>//</tt>) comments. ex: 
<div class="code"><pre>[t setTokenizerState:t.symbolState from:'/' to:'/'];</pre></div>
<p><small>slash chars now produce individual Symbol tokens rather than causing the tokenizer to strip text until the next newline char or begin stripping for a multiline comment if appropriate (<tt>/*</tt>)</small></p>
</li>

<li>turn on recognition of "hash" (<tt>#</tt>) single-line comments. ex: 
<div class="code">
<pre>[t setTokenizerState:t.commentState from:'#' to:'#'];
[t.commentState addSingleLineStartSymbol:@"#"];</pre>
</div>
</li>

<li>turn on recognition of "XML/HTML" (<tt>&lt;!-- --></tt>) multi-line comments. ex: 
<div class="code">
<pre>[t setTokenizerState:t.commentState from:'&lt;' to:'&lt;'];
[t.commentState addMultiLineStartSymbol:@"&lt;!--" endSymbol:@"-->"];</pre>
</div>
</li>

<li>report (rather than silently consume) Comment tokens. ex: 
<div class="code">
<pre>t.commentState.reportsCommentTokens = YES; // default is NO</pre>
</div>
</li>

<li>report (rather than silently consume) Whitespace tokens. ex: 
<div class="code">
<pre>t.whitespaceState.reportsWhitespaceTokens = YES; // default is NO</pre>
</div>
</li>

<li>turn on recognition of any characters (say, digits) as whitespace to be silently ignored. ex: 
<div class="code">
<pre>[t setTokenizerState:t.whitespaceState from:'0' to:'9'];</pre>
</div>
</li>

</ul>
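
<p>Several of the options above can be combined on one tokenizer. The sketch below configures a tokenizer for a made-up config-file syntax in which identifiers may begin with an underscore, <tt>!=</tt> is an operator, and <tt>#</tt> starts a comment. It uses only the configuration calls shown in the list; the function name <tt>TokenizeConfigLine</tt> and the <tt>ParseKit/ParseKit.h</tt> umbrella header are assumptions.</p>

<div class="code">
<pre>
#import &lt;Foundation/Foundation.h&gt;
#import &lt;ParseKit/ParseKit.h&gt; // assumed umbrella header

// Hypothetical helper: tokenizes one line and returns the PKToken objects.
static NSArray *TokenizeConfigLine(NSString *s) {
    PKTokenizer *t = [PKTokenizer tokenizerWithString:s];

    // treat "!=" as a single Symbol token
    [t.symbolState add:@"!="];

    // let Word tokens both start with and contain underscores
    [t setTokenizerState:t.wordState from:'_' to:'_'];
    [t.wordState setWordChars:YES from:'_' to:'_'];

    // silently strip "#" single-line comments
    [t setTokenizerState:t.commentState from:'#' to:'#'];
    [t.commentState addSingleLineStartSymbol:@"#"];

    NSMutableArray *toks = [NSMutableArray array];
    PKToken *eof = [PKToken EOFToken];
    PKToken *tok = nil;
    while ((tok = [t nextToken]) != eof) {
        [toks addObject:tok];
    }
    return toks;
}
</pre>
</div>

<p>Calling <tt>TokenizeConfigLine(@"_max_retries != 3 # retry limit")</tt> is intended to yield a <tt>Word</tt>, a <tt>Symbol</tt> and a <tt>Number</tt> token, with the trailing comment dropped. Printing each token with <tt>-debugDescription</tt> is a quick way to confirm the configuration behaves as intended.</p>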

<hr/>
<h3>Parsing</h3>

<p>ParseKit also includes a collection of token parser subclasses (of the abstract <tt>PKParser</tt> class) including collection parsers such as <tt>PKAlternation</tt>, <tt>PKSequence</tt>, and <tt>PKRepetition</tt> as well as terminal parsers including <tt>PKWord</tt>, <tt>PKNum</tt>, <tt>PKSymbol</tt>, <tt>PKQuotedString</tt>, etc. Also included are parser subclasses which work on individual chars such as <tt>PKChar</tt>, <tt>PKDigit</tt>, and <tt>PKSpecificChar</tt>. These char parsers are useful for things like RegEx parsing. Generally speaking though, the token parsers will be more useful and interesting.</p>

<p>The parser classes represent a <b>Composite</b> pattern. Programs can build a composite parser, in <b>Objective-C</b> (rather than a separate language like with lex&amp;yacc), from a collection of terminal parsers composed into alternations, sequences, and repetitions to represent an infinite number of languages.</p>

<p>Parsers built from ParseKit are <b>non-deterministic, recursive descent parsers</b>, which basically means they trade some performance for ease of user programming and simplicity of implementation.</p>

<p>Here is an example of how one might build a parser for a simple voice-search command language (note: ParseKit does not include any kind of speech recognition technology). The language consists of:</p>

<div class="code">
<pre>search google for? &lt;search-term&gt;</pre>
</div>

<div class="code">
<pre>
...

	[self parseString:@"search google 'iphone'"];
...
	
- (void)parseString:(NSString *)s {
	PKSequence *parser = [PKSequence sequence];

	[parser add:[[PKLiteral literalWithString:@"search"] discard]];
	[parser add:[[PKLiteral literalWithString:@"google"] discard]];

	PKAlternation *optionalFor = [PKAlternation alternation];
	[optionalFor add:[PKEmpty empty]];
	[optionalFor add:[PKLiteral literalWithString:@"for"]];

	[parser add:[optionalFor discard]];

	PKParser *searchTerm = [PKQuotedString quotedString];
	[searchTerm setAssembler:self selector:@selector(workOnSearchTermAssembly:)];
	[parser add:searchTerm];

	PKAssembly *result = [parser bestMatchFor:[PKTokenAssembly assemblyWithString:s]];
	
	NSLog(@" %@", result);

	// output:
	//  ['iphone']search/google/'iphone'^
}

...

- (void)workOnSearchTermAssembly:(PKAssembly *)a {
	PKToken *t = [a pop]; // a QuotedString token with a stringValue of 'iphone'
	[self doGoogleSearchForTerm:t.stringValue];
}

</pre>
</div>
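
<p>The same composite approach extends to repetitions. The sketch below builds a parser for a comma-separated list of words, i.e. <tt>word (',' word)*</tt>. It assumes the factory methods <tt>+[PKWord word]</tt>, <tt>+[PKSymbol symbolWithString:]</tt> and <tt>+[PKRepetition repetitionWithSubparser:]</tt>, which are not shown elsewhere on this page, so check them against the Header Docs before relying on them. The function name <tt>ParseWordList</tt> is hypothetical.</p>

<div class="code">
<pre>
#import &lt;Foundation/Foundation.h&gt;
#import &lt;ParseKit/ParseKit.h&gt; // assumed umbrella header

// Hypothetical helper: matches input such as "red, green, blue".
static PKAssembly *ParseWordList(NSString *s) {
    // (',' word) : the comma is matched but discarded from the result
    PKSequence *commaWord = [PKSequence sequence];
    [commaWord add:[[PKSymbol symbolWithString:@","] discard]]; // assumed factory method
    [commaWord add:[PKWord word]];                              // assumed factory method

    // word (',' word)*
    PKSequence *list = [PKSequence sequence];
    [list add:[PKWord word]];
    [list add:[PKRepetition repetitionWithSubparser:commaWord]]; // assumed factory method

    // as in the example above, ask for the best match against the token stream
    return [list bestMatchFor:[PKTokenAssembly assemblyWithString:s]];
}
</pre>
</div>

<p>Because the repetition wraps a whole sequence, a comma is only accepted when a word follows it. This is the kind of rule that is awkward to express with a tokenizer alone.</p>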

</div>

<?php include('common-footer.inc'); ?>
